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Abstract 

Gaussian Markov random fields (GMRFs) are frequently used as computationally efficient models 
in spatial statistics. Unfortunately, it has traditionally been difficult to link GMRFs with the more 
traditional Gaussian random field models as the Markov property is difficult to deploy in continuous 
space. Following the pioneering work of Lindgren et al. (2011), we expound on the link between 
Markovian Gaussian random fields and GMRFs. In particular, we discuss the theoretical and practical 
aspects of fast computation with continuously specified Markovian Gaussian random fields, as well as 
the clear advantages they offer in terms of clear, parsimonious and interpretable models of anisotropy 
and non-stationarity. 



1 Introduction 

From a practical viewpoint, the primary difficulty with spatial Gaussian models in applied statistics is 
dimension, which typically scales with the number of observations. Computationally speaking, this is a 
disaster! It is, however, not a disaster unique to spatial statistics. Time series models, for example, can 
suffer from the same problems. In the temporal case, the ballooning dimensionality is typically tamed 
by adding a conditional independence, or Markovian, structure to the model. The key advantage of 
the Markov property for time series models is that the computational burden then grows only linearly 
(rather than cubically) in the dimension, which makes inference on these models feasible for long time 
series. 

Despite its success in time series modelling, the Markov property has had a less exalted role in 
spatial statistics. Almost all instances where the Markov property has been used in spatial modelling 
has been in the form of Markov random fields defined over a set of discrete locations connected by a 
graph. The most common Markov random field models are Gaussian Markov random fields (GMRFs), 



in which the value of the random field at the nodes is jointly Gaussian (Rue and Held 2005). GMRFs 
are typically written as 

x -N^Q- 1 ), 

where Q is the precision matrix and the Markov property is equivalent to requiring that Q is sparse, 
that is Qij = iff X{ and Xj are conditionally independent ( Rue and Held 



2005) 



As problems in spatial statistics are usually concerned with inferring a spatially continuous effect 
over a domain of interest, it is difficult to directly apply the fundamentally discrete GMRFs. For this 
reason, it is commonly stated that there are two essential fields in spatial statistics: the one that uses 
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GMRFs and the one that uses continuously indexed Gaussian random fields. In a recent read paper, 
Lindgren et al. (2011) showed that these two approaches are not distinct. By carefully utilising the 



continuous space Markov property, it is possible to construct Gaussian random fields for which all 
quantities of interest can be computed using GMRFs! 

The most exciting aspect of the Markovian models of Lindgren et al. (2011) is their flexibility. There 
is no barrier — conceptual or computational — to extending them to construct non-stationary, anisotropic 
Gaussian random fields. Furthermore, it is even possible to construct them on the sphere and other 
manifolds. In fact, Simpson et al. (2011a) showed that there is essentially no computational difference 
between inferring a log-Gaussian Cox process on a rectangular observation window and inferring one 
on a non-convex, multiply connected region on the sphere! This type of flexibility is not found in any 
other method for constructing Gaussian random field models. 

In this paper we carefully review the connections between GMRFs, Gaussian random fields, the 
spatial Markov property and deterministic approximation theory. It is hoped that this will give the 
interested reader some insight into the theory and practice of Markovian Gaussian random fields. In 
Section [2] we briefly review the practical computational properties of GMRFs. Section [3] we take a 
detailed tour of the theory of Markovian Gaussian random fields. We begin with a discussion of the 
spatial Markov property and show how it naturally leads to differential operators. We then present a 
practical method for approximating Markovian Gaussian random fields and discuss what is mean by a 
continuous approximation. In particular, we show that deterministic approximation theory can provide 
essential insights into the behaviour of these approximations. We then discuss some practical issues 
with choosing sets of basis functions before discussing extensions of the models. Finally we mention 
some further extensions of the method. 



2 Practical computing with Gaussian Markov random fields 

Gaussian Markov random fields possess two pleasant properties that make them useful for spatial 
problems: they facilitate fast computation for large problems, and they are quite stable with respect to 
conditioning. In this section we will explore these two properties in the context of spatial statistics. 

2.1 Fast computations with Gaussian Markov random fields 

As in the temporal setting, the Markovian property allows for fast computation of samples, likelihoods 



and other quantities of interest (Rue and Held, 2005). This allows the investigation of much larger 



models than would be available using general multivariate Gaussian models. The situation is not, 
however, as good as it is in the one dimensional case, where all of these quantities can be computed 
using 0{n) operations, where n is the dimension of the GMRF. Instead, for the two dimensional spatial 
models, samples and likelihoods can be computed in 0(n 3 / 2 ) operations, which is still a significant 
saving on the C(n 3 ) operations required for a general Gaussian model. A quick order calculation shows 
that computing a sample from an n-dimensional Gaussian random vector without any special structure 
takes the same amount of time as computing a sample from GMRF of dimension n = h 2 l 

The key object when computing with GMRFs is the Cholesky decomposition Q = LL T , where L 
is a lower triangular matrix. When Q is sparse, its Cholesky decomposition can be computed very 



efficiently (see, for instance, Davis, 2006). Once the Cholesky triangle has been computed, it is easy to 
show that x = fj, + L~ T z is a sample from the GMRF x ~ N(fi, Q^ 1 ) where z ~ N(0, 1). Similarly, 
the log density for a GMRF can be computed as 

n n 1 

logvr(a;) = -- log(2vr) + ^ log La - -{x - fi) T Q(x- fi), 
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where La is the ith diagonal element of L and the inner product (x — fi) T Q(x — fi) can be computed in 
0(n) calculations using the sparsity of Q. It is also possible to use the Cholesky trian gle L to compute 
diag(Q _1 ), which are the marginal variances of the GMRF (Rue and Martino, 2007). 



Furthermore, it is possible to sample from x conditioned on a small number of linear constraints 
x\Bx = b, where B G M fcxn is usually a dense matrix and the number of constraints, k, is very small. 
This occurs, for instance, when the GMRF is constrained to sum to zero. However, if one wishes 
to sample conditional on data, which usually corresponds to a large number of linear constraints, the 
methods in the next section are almost always significantly more efficient. While direct calculation of the 
conditional density is possible, when B is a dense matrix conditioning destroys the Markov structure of 
the problem . It is still, however, p ossible to sample efficiently using a technique known as conditioning 
by Kriging (Rue and Held 2005), whereby an unconditional sample x is drawn from A^(/i,Q _1 ) and 
then corrected using the equation 



x 



x 



Q 



- l B T ( 



BQ 1 B T ) 1 (Bx — b). 



When k is small, the conditioning by Kriging update can be computed efficiently from the Cholesky 
factorisation. We reiterate, however, that when there are a large number of constraints, the conditioning 
by Kriging method will be inefficient, and, if B is sparse (as is the case when conditioning on data), it 
is usually better use the methods in the next subsection. 

An alternative method for conditional sampling can be constructed by noting that the conditioning 
by Kriging update x* = x — Sx can be computed by solving the augmented system 



Q B T 
B 



Sx 

y 



o 

Bx-b 



(1) 



where y is an auxiliary variable. A standard application of Sylvester's inertia theorem (Theorem 8.1.17 
in Golub and van Loan, 1996) shows that the matrix in Q is not positive definite, however the system 
can still be solved using ordinary direct (or iterative) methods (Simpson et al. 2008). The augmented 



system ([!]) reveals the geometric structure of conditioning by Kriging: augmented systems of equations 



of this form arise from the Karush-Kuhn- Tucker conditions in constrained optimisation (Benzi et al. 
2005). In particular, the conditional sample solves the minimisation problem 



x* =arg min -II x* 
x*eR™ 2 



x 



subject to Bx* = b, 

where x is the unconditional sample and WvWq = y T Qy- That is, the conditional sample is the closest 
vector that satisfies the linear constraint to the unconditional sample when the distance is measured in 
the natural norm induced by the GMRF. 

The methods in this section are all predicated on the computation of a Cholesky factorisation. 
However, for genuinely large problems, it may be impossible to compute or store the Cholesky factor. 



With this in mind, a suite of methods were developed by Simpson et al. (2007) based on modern iterative 



methods for solving sparse linear systems. These methods have shown some promise for large problems 



(Strickland et al. 



density (Simpson 



2009 



2011 Aune et al. 2011), although more work is needed on approximating the log 



Aune and Simpson, 2011). 



2.2 The effect of conditioning: fast Bayesian inference 

The second appealing property of GMRFs is that they behave well under conditioning. The discussion 



in this section is intimately tied to discretely specified GMRFs, however we will see in Section 3.6 that 



3 



the formulation below, and especially the matrix A, has an important role to play in the continuous 
setting. Consider the simple Bayesian hierarchical model 



y\x ~ N(Ax, Q 



where A, Q x and Q y are sparse matrices. A simple manipulation shows that (x 1 , y 
Gaussian Markov random field with joint precision matrix 

(Q x + A T Q y A -A T Q y 

V ~Qv A Qy 
and the mean defined implicitly through the equation 



(2a) 
(2b) 

' r is jointly a 

(3) 



Qxv^ 



xyr^xy 



o 



As (x T ,y T ) T is jointly a GMRF, it is easy to see (see, for example, 



Rue and Held 



2005) that 



x\y ~ N (fi + (Q a 



A T Q V A) 



L A T QJy - An), (Q x + A T Q y A)~ 1 ) 



(4) 



It is important to note that the precision matrices for the joint ([3]) and the conditional Q distributions 
are only sparse — and the corresponding fields are only GMRFs — if A is sparse. This observation directly 
links the structure of the matrix A to the availability of efficient inference methods and will be important 
in the coming sections. 

For practical problems in spatial statistics, the model is not enough: there will be unknown 
parameters in both the likelihood and the latent field. If we group the unknown parameters into a 
vector 6, we get the following hierarchical model 

y\x, ~ N(Ax, QyiOy 1 ) (5a) 

x \e ~N{n,Q x {oy l ) (5b) 
e ~ tt(0). (5c) 

In order to perform inference on ([5]), it is common to use Markov chain Monte Carlo (MCMC) methods 
for sampling from the posterior ir(x, 0\y), however, this is not necessary. It's an easy exercise in Gaussian 
density manipulation to show that the marginal posterior for the parameters, denoted n(6\y) can be 
computed without integration and is given by 

tt(x, y\0)ir(0) 
n(x\y,G) 

where x* can be any point, but is typically taken to be the conditional mode E(x\y,0), and the 
corresponding marginals n(6j\y) can be computed using numerical integration. Similarly, the marginals 
ir(xi\y) can be computed using numerical integration and the observation that, for every 9, ir(x,y\G) 



ir(0\y) oc 



is a GMRF (Rue and Martino, 2007 Rue et al. 2009) 



It follows that, for models with Gaussian observations, it is possible to perform deterministic in- 
ference that is exact up to the error in the numerical integration. In particular, if there are only a 
moderate number of parameters, this will be extremely fast. For non-Gaussian observation processes, 



exact deterministic inference is no longer possible, however, Rue et al. (20091 showed that it is possible 
to construct extremely accurate approximate inference schemes by cleverly deploying a series of Laplace 
approximation. The integrated nested Laplace approximation (INLA) has been used successfully on a 



large number of spatial problems (see, for example, 


Fong et al. 


2010 


Akerkar et al. 


Held 


2011 Riebler et al. 


201 1| 


Cameletti et al. 


2011 


Illian et al. 


2011 ) and a user 



is available from http://r-inla.org 



2010; Schrodle and 
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Figure 1: An illustration of the spatial Markov property. If, for any appropriate set S, {x(s) : s G ^4)} 
is independent of {x(s) : s G B)} given {x(s) : s G S)}, then the field x(s) has the spatial Markov 
property. 



3 Continuously specified, Markovian Gaussian random fields 



One of the primary aims of spatial statistics is to infer a spatially continuous surface x(s) over the 
region of interest. It is, therefore, necessary to build probability distributions over the space of func- 
tions, and the standard way of doing this is to construct Gaussian random fields, which are the 
generalisation to functions of multivariate Gaussian distributions in the sense that for any collec- 
tion of points (si,S2, 
x = (x(si),x(s 2 ), .... 



the field evaluated at those points is jointly Gaussian. In particular 
(s p )) T ~ N([i,Yj), where the covariance matrix is given by £y = c(si,Sj) for 
some positive definite covariance function c(-, •). In most commonly used cases, the covariance function 
is non-zero everywhere and, as a result, X is a dense matrix. 

It is clear that we would like to transfer some of the pleasant computational properties of GMRFs, 
which are outlined above, to the Gaussian random field setting. The obvious barrier to this is that 



classical GMRF models are strongly tied to discrete sets of points, such as graphs (Rue and Held, 2005) 



and lattices (Besag, 1974). Throughout this section, we will discuss the recent work of Lindgren et al. 



(2011) that has broken down the barrier between GMRFs and spatially continuous Gaussian random 



field models. 



3.1 The spatial Markov property 



For temporal processes, defining the Markov property is greatly simplified by the structure of time: its 
directional nature and the clear distinction between past, present and future allow for a very natural 
discussion of neighbourhoods. Unfortunately, space is far less structured and, as such the Markov 
property is harder to define exactly. Intuitively, however, the definition is generalised in an the obvious 
way and is demonstrated by Figure [T] Informally, a Gaussian random field x(s) has the spatial Markov 
property if, for every appropriate set S separating A and B, the values of x(s) in A are conditionally 
independent of the values in B given the values in S. A formal definition of the spatial Markov property 



can be found in Rozanov (1977). 



It is not immediately obvious how the spatial Markov property can be used for computational 
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inference. However, in an almost completely ignored paper, Rozanov (1977) provided the vital charac- 
terisation of Markovian Gaussian random fields in terms of their power spectra. The power spectrum of 
a stationary Gaussian random field is defined as the Fourier transform of its covariance function c(h), 
that is 



R(k) = 



1 



(2vr) c 



exp(—ik T h)c(h). 



Rozanov showed that a stationary field is Markovian if and only if R(k) 
positive, symmetric polynomial. 



l/p(k), where p(k) is a 



3.2 From spectra to differential operators 

While Rozanov's characterisation of stationary Markovian random fields in terms of their power spectra 
is elegant, it is not obvious how we can turn it into a useful computational tool. The link comes from 
Fourier theory: there is a one-to-one correspondence between polynomials in the frequency space and 
differential operators. To see this, define the covariance operator C of the Markovian Gaussian random 
field as the convolution 

C[f(-)](h)= [ c(h'-s)f(h')dh' 
= [ exp(ik T h)^dk, 

where p(k) = 1/R(k) = ^|i|<£ a «^* ^ s a d-variate, positive, symmetric polynomial of degree I; i = 
{h,...,i d ) T G N d is a multi-index, meaning that fc* = Ui=i k i and H = £jLi*i; /(') 

is a smooth 

function that goes to zero rapidly at infinity; and f{k) is the Fourier transform of f(h). It follows from 
Fourier theory that the covariance operator C has an inverse, which we will call the precision operator 
Q, and it is given by 

Q[f(-)](h) = C- 1 [f(-)](h)= [ eMik T h)f(k)p(k)dk 

= <HD*f(h), 
\i\<£ 

where D l = — -, — r~ are the appropriate multivariate derivatives. 

dh?dh l j-dh l * 

The following proposition summarises the previous discussion and specialises the result to isotropic 
fields. 

Proposition 1. A stationary Gaussian random field x(s) defined on M. d is Markovian if and only if its 
covariance operator C [/(•)] has an inverse of the form 

Q[f(-)](h) = ^^D i f(h), 
\i\<e 

where ai are the coefficients of a real, symmetric polynomial p{k) = ^2\i\<£ a ik l ■ Furthermore, x(s) is 
isotropic if and only if 

t 
i=0 

where A = Yli=i q%z ^ s ^ e d-dimensional Laplacian andp(t) = Ylt=o ^* ^ s a rea h symmetric univariate 
polynomial. 
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The critical point of the above paragraph is that for Markovian Gaussian random fields, the co- 
variance is the inverse of a local operator, in the sense that the value of Q [/(•)] (h) only depends on 
the value of f(h) in an infinitesimal neighbourhood of h. This is in stark contrast to the covariance 
operator, which is an integral operator and therefore depends on the value of f{h) everywhere that the 
covariance function is non-zero. It is this locality that will lead us to GMRFs and the promised land of 
sparse precision matrices and fast computations. 



3.3 Stochastic differential equations and Matern fields 



The discussion in the previous subsection gave a very general form of Markovian Gaussian random 
fields. In this section we will, for the sake of sanity, simplify the setting greatly. Consider the stochastic 
partial differential (SPDE) 

(«2 _ A) a/2 n 



X[S) 



W(s), (6) 
where W(s) is Gaussian white noise and a > 0. The solution to the SPDE will be a Gaussian random 



field and some formal manipulations (made precise by Rozanov, 1977) show that the precision operator 
is 



Q = [E{xx*)]- 1 
= {k 2 - A) a . 



(k 2 - A)- a / 2 E(WW*)(K 2 - A) 



-a/2 



It follows that Q is a local differential operator when a is an integer and, by Proposition [TJ the stationary 
solution to ^ for integer a is a Markovian Gaussian random field. 

Somewhat surprisingly, we have now ventured back into the more common parts of spatial statistics: 
Whittle (1954, 1963) showed that, for any a > d/2, the solution to ([6]) has a Matern covariance function. 
That Matern family of covariance functions, which is given by 



2 

c a*,u,M = 2 v- a i T{u) (4h\\rK u (K\\h\\), 



where a 2 is the variance parameter, k controls the range, and v = a — <i/2 > is the shape parameter, 
is one of the most widely used families of stationary, isotropic covariance functions. The results of the 
previous section show that, when v + d/2 is an integer, the Matern fields are Markovian. 



3.4 Approximating Gaussian random fields: the finite element method 



Having now laid all of the theoretical groundwork, we can consider the fundamental question in this 
paper: how do can we use GMRFs to approximate a Gaussian random field? In order to construct 
a good approximation, it is vital to remember that every realisation of a Gaussian random field is a 
function. It follows that any sensible method for approximating Gaussian random fields is necessarily 
connected to a method for approximating an appropriate class of deterministic functions. Therefore, 



following Lindgren et al. (2011), we are on the hunt for simple methods for approximating functions. 



A reasonably simple method for approximating continuous functions is demonstrated in Figure [2j 
which shows a piecewise linear approximation to a deterministic function f(s) that is defined over a 
triangulation of the domain. This approximation is of the form 



f(s) ~ fh(s) = ^Wifais) 



i=l 
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(a) A continuous function 



(b) A piecewise linear approximation 



Figure 2: Piecewise linear approximation of a function over a triangulated mesh. 



where the basis function <pi(s) is the piecewise linear function that is equal to one at the ith vertex of 
the mesh and is zero at all other vertices and the subscript h denotes the largest triangle edge in the 
mesh and is used to differentiate between piecewise linear functions over a mesh and general functions. 
The grey pyramid in Figure [2] is an example of a basis function. With this in hand, we can define a 
piecewise linear Gaussian random field as 

n 

Xh{s) = ^2wi(/>i(s), (7) 
i=i 

where 4>i(s) are as before and the weights w are now jointly a Gaussian random vector. 

Our aim is now to find the statistical properties of w that make Xh(s) approximate the Markovian 
Matern fields. In order to do this, we use the characterisation of a Matern field as the stationary solution 
to ([6]). For simplicity, let us consider a = 2 on a domain Q C R 2 . Any solution of Q also satisfies, for 
any suitable function ip(s), 

[ ^(s)(k 2 - A)x(s)ds= [ ip(s)W(ds) 
Jn Jn 

and an application of Green's formula leads to 



K?tl>(s)x(s) + Vip(s) ■ Vx(s) dx = / ^j(s)W(ds), (8) 
n Jn 

where the integrals on the right hand sides are integrals with respect to white nose (see Chapter 5 of 



Adler and Taylor, 2007) and the second line follows from Green's formula and the (new) condition that 
the normal derivative of x(s) vanishes on the boundary of O. Furthermore, if we find x(s) such that 
^ holds for any sensible ip(s), then x(s) is the weak solution to ([6]) and it can be shown that it is a 
Gaussian random field with the appropriate Matern covariance function. 

Unfortunately, we cannot test ^ against every function ip(s), so we will instead chose a finite set 
{tpj(s)}j' = i to test against. Substituting Xh(s) into Q and testing against this set of ^s, we get the 
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system of linear equations 

y~] I k 2 / ifjj(s)(j)i(s) ds + / Vtpj(s) ■ V(f>i(s) ds I Wi 
~1 \ Jn Jn J 



ipj(s)dW(s) 



, n. 



(9) 



Finally, we chose our test functions ipj(s) to be the same as our basis functions <f>i(s) and arrive at 
the Galerkin finite element method. For piecewise linear functions over a triangular mesh, it is easy to 
compute all of the integrals on the left hand side of ^ . The white noise integral on the right hand side 
can be computed and it is Gaussian with mean zero and 



Cov 



</>i(s)dW{s), / 4> j (s)dW(s) 
n Jn 



4>i(s)<j)j(s) ds. 



(10) 



We can, therefore write ([9]) in matrix form as 

Kw 



N(0,C), 



where K = kC+G and the matrices are given by Cij = j n <f>i(s)<f>j(s) ds and G%j = j n V^j(s)-V0i(s) ds. 

A quick look at the definitions of C and G shows that, due to the highly local nature of the basis 
functions, these matrices are sparse. Unfortunately, C is a dense matrix and, therefore w will not 



be a GMRF. However, replacing C by the diagonal matrix C 
essentially the same result numerically ( |Bolin and Lindgren 



rate of convergence of the approximation (Appendix C.5 of 
that the solution to 

Kw ~ N(0,C) 

is a GMRF with zero mean and sparse precision matrix Q = 



diag( J n <j)i(s)ds,i = l,...,n) gives 



2009 


) and can b 


Lind 


gren et al. 



K T CK. The GMRFs that correspond to 



other integer values of a can be found in Lindgren et al. (2011). 



3.5 A continuous approximation to a continuous random field 

We have now derived a GMRF w from the continuous Matern field x(s). It is, therefore, reasonable to 
wonder how well w approximates x(s). It doesn't! The GMRF w was defined as the weights of a basis 
function expansion ([7]) and only make sense in this context. In particular, w is not trying to approximate 
x(s) at the mesh vertices. Instead, the piecewise linear Gaussian random field Xh(s) = ^2^=1 Wi4>i(s) 
tries to approximate the true random field everywhere. Because of the nature of this continuous ap- 
proximation, Xh(s) will necessarily overestimate the variance at the vertices and underestimate in in 
the centre of the triangles. 

One of the real advantages of using piecewise linear basis functions is that a lot of work has gone into 



working out their approximation properties ( Brenner and Scott 2007 ) . We can leverage this information 



to get hard bounds on the convergence of Xh{s) to x(s). The following theorem is a simple example of 
this type of result. It shows that functionals of both the field and its derivative converge to the true 
functionals and the error is 0(h), where h is the length of the largest edge in the mesh. The theorem also 
links the convergence of functionals of the random field to how well the piecewise linear basis functions 
can approximate functions in the Sobolev space H , which consists of square integrable functions f(s) 
for which \\f\\ 2 Hl = k 2 f a f{s) 2 ds + / n V/(s) • V/(s) ds is finite. 

Theorem 1. Let L = k 2 — A. Then, for any f £ H 1 , 

E U f(s)L(x(s) - x h (s)) ds} < ch 2 \\f\\ 2 Hl , 
where c is a constant and h is the size of the largest triangle edge in the mesh. 
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Proof. Let / G H , fh{s) be the if ^projection of / onto the finite element space Vh = span{c/>i(s), . . . , <p n (t 
that is let fh be the solution to 



It follows that 



f(s)Lx h (s)ds 



min ||/ - f h \\ H i. 

Jh&Vh 



(f(s) - fh(s))Lx h (s)ds + / f h (s)Lx h (s)ds 
n Jn 



f h {s)Lx h {s)dx 
f h (s)dW(s), 



where the second equality follows from the Galerkin property of Xh(s) (Brenner and Scott, 2007), which 



states that j n g(s)Lxh(s) ds = whenever g(s) is in the orthogonal complement of Vh with respect to 
the if 1 inner product. It follows directly that 

f(s)L(x(s)-Xh(s))ds= [ (f(s)-f h (s))dW(s). 
a Jn 

Therefore, it follows from the properties of white noise integrals that 

El U(s)L(x(s)-x h (s))ds\ =E(J(f(s)-f h (8))dW(a) 

(f(s)-f h (s)) 2 ds. 



It follows from Theorem 4.4.20 in Brenner and Scott (2007) that (under some suitable assumptions on 
the triangulation) 



\\fn-f\\mn)<ch\\f\\ Hl 



□ 



The key lesson from the proof of Theorem [T] is that the convergence of functionals of Xh{s) depends 
solely on how well the basis functions can approximate a fixed H 1 function. Therefore, it is vital to 
consider the approximation properties of your basis functions! 



3.6 Choosing basis functions: don't forget about A 

While we have computed everything with respect to piecewise linear functions, the methods considered 
above work for any set of test and basis functions for which all of the computations make sense. However, 
we strongly warn against using any of the more esoteric choices. There are two main issues that can 
appear. The first issue is that the wrong choice of basis functions will destroy the Markov structure of 
the posterior model, which will annihilate the computational gains we have worked so hard for. The 
second issue, which is related to Theorem [l] is much more problematic: not all sets of basis functions 
will provide good approximations to x(s). 



In Section 2.2, we looked at the GMRF computations for hierarchical Gaussian models. Consider a 
simple Gaussian observation process of the form 



Vi ~ N(x h (si),a 2 



1,...,N 
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Figure 3: Taken from (Simpson et al. 2011b). The error in the best approximation to x(s) = 1, 
s € [0, 1]. The dotted line is the naive kernel basis, while the solid line is an integrated kernel basis 
derived by Simpson et al. (2011b). The piecewise linear basis (dashed line) reproduces the function 
exactly. 



where the number of data points N does not need to be related to the number of mesh vertices n. It 
follows that the datapoint (si,yi) requires the computation of the sum Y^=i w j4>j( s i)- When the basis 
functions are local, there are only a few non-zero terms in this sum and the corresponding matrix A, 
which has entries Aij = <pj(si), is sparse. For piecewise linear functions on a triangular mesh, each row 
of A has at most 3 non-zero entries. 

On the other hand, consider a basis consisting of the first n functions from the Karhunen-Loeve 
expansion of x(s). In this case, it's easy to show that the precision matrix for w will be diagonal. 
Unfortunately, these basis function are usually non-zero everywhere, so A will be a completely dense 
N x n matrix, which, for large datasets will become the dominant computational cost. 

The approximation properties of sets of basis functions are typically very hard to determine. There 
are, however, some good guiding principles. The first principle is that your basis functions should 
do a good job approximating simple functions. You usually want to be able to at least approximate 
constant and linear functions well. A second guiding principle is that you should be very careful when 
your basis functions depend on a parameter that is being estimated. It is important to check that the 
approximation is still sensible over the entire parameter range. Figure [3] is an example of this problem, 
taken from Simpson et al. (2011b), shows the best approximation to a constant function using the basis 
derived from a convolution kernel approximation to a one dimensional Matern field with v = 3/2 and 
k = 20. The basis functions are computed with respect to a fixed grid of 11 equally spaced knots Sj, 
i = 1, . . . , 11 and are given by 

When n gets large, the basis functions get sharper and the approximation gets worse. At its worst, the 
effective range of the kernel becomes shorter than grid of knots. Therefore, if k becomes large relative 
to the grid spacing, both Kriging estimates and variance estimates will be badly infected by the poor 
choice of basis function. On the other hand, it can be shown that piecewise linear Galerkin finite element 
approximations are stable in the sense that the norm of the approximation can be bounded above by 
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(a) A sample from an anisotropic field 



(b) The covariance functions 



Figure 4: An anisotropic field denned through ^ when the Laplacian is replace with the anisotropic 
variant V- (-D(s)V(rx(s))). The right hand figure shows the approximate covariance function evaluated 
at three points and the non-stationarity is evident. 



an appropriate norm of the true solution. This means that a piecewise linear basis, when used in this 
way, will never display this bad behaviour. 

Another consideration when choosing sets of basis functions is the smoothness of the random field. 
Theorem [T] directly links the convergence of the approximate fields to how well the basis functions can 
approximate a certain class of functions. This will usually be the case. Typically, the smoother the field 
is, the more useful higher order "spectral" bases will be. For a smooth enough field, it may actually be 
cheaper to use a small, well chosen global basis than a large basis full of local functions. 



3.7 Extending the models: anisotropy, drift, and space-time models on manifolds 

One of the main advantages to the SPDE formulation is that it is easy to construct non-stationary 
variants. Non-stationary fields can now be defined by letting the parameters in ([6]), be space-dependent. 
For example, log k can be expanded using a few weighted smooth basis functions 

logfc(a) (11) 

i 

and similar expansions can be used for r. This extension requires only minimal changes to the method 
used in the stationary case. More interestingly, we can also incorporate models of spatially varying 
anisotropy by replacing the Laplacian A in Q with the more general operator 



v.(jjwvw#e^AL(,)A 

i,j=l 1 ^ ■? 



(t(s)x(s)) 



These models can be extended even farther by incorporating a "drift" term, as well as temporal depen- 
dence, which leads to the general model 

Q- t (Tu(s, t)) + (k 2 (tx(s, t)) - V • (DV(tx(s, t)))) + b ■ V(tx(s, t)) = W(s, t), (12) 

where t is the time variable and b is a vector describing the direction of the drift and the dependence 
of k, t, D and b on s and t has been suppressed. The concepts behind the construction of Markovian 
approximations to the stationary SPDE transfer almost completely to this general setting, however 
more care needs to be taken with the exact form of the discretisation (Fuglstad 2010). 
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Figure 5: A sample from a non-stationary, anisotropic random field on the sphere. 



A strong advantage of defining non-stationarity through (|12j) is that the underlying physics is well 
understood. For example the second order term V • (D~V(tx(s, t)))) describes the diffusion, where the 
matrix D(s,t) describes the preferred direction of diffusion at location s and time t. A spatial random 
field with a non-constant diffusion matrix D(s) is shown in Figure |4j Similarly, the first order term 
b ■ V(tx(s, t)) describes an advection effect and b(s, t) can be interpreted as the direction of the forcing 
field. We note that unlike other methods for modelling non-stationarity, the parametrisation in this 



model is unconstrained. This is in contrast to the deformation methods of Sampson and Guttorp ( 1992 ), 



in which the deformation must be constrained to be one-to-one. Initial work on parameterising and 



performing inference on these models has been undertaken by Fuglstad (2011). 



An interesting consequence of defining our models through local stochastic partial differential equa- 
ls replaced by a space that is only 



tions, such as (12), is that the SPDEs still make sense when 



locally flat. We can, therefore, use (12) to define non-stationary Gaussian fields on manifolds, and still 



obtain a GMRF representation. Furthermore, the computations can be done in essentially the same 
way, the only change is that the Gaussian noise is now a Gaussian random measure, and that we need 
to take into account the local curvature when computing the integrals to obtain the solution. Figure [5] 
shows a realisation of a non-stationary Gaussian random field on a sphere, with a model similar to the 
one used in Figure |4j The solution is still explicit, so all elements of the precision matrix, for a fixed 
triangularisation, can be directly computed with no extra cost. The ability to construct computationally 
efficient representations of non-stationary random fields on a manifold is important, for example, when 
modelling environmental data on the sphere. 



3.8 Further extensions: areal data, multivariate random fields and non-integer as 

The combination of a continuously specified approximate random field and a Markovian structure allows 
for the straightforward assimilation of areal and point-referenced data. The trick is to utilise the matrix 



A in the observation equation (5a). The link between the latent GMRF and the point-referenced data is 



as described above. For the areal data, the dependence is through integrals of the random field and these 
integrals can be computed exactly for Xh(s) and the resulting sums go into A. If a more complicated 
functional of the random field has been observed, this can be approximated using numerical quadrature. 



This has been recently applied by Simpson et al. (2011a) to inferring log-Gaussian point processes. 



It is also possible to extend the SPDE framework considered above to a multivariate setting. Pri- 
marily, the idea is to replace a single SPDE with a system of SPDEs. It can be shown (work in progress) 
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that these models overlap with, but are not equivalent to, the multivariate Matern models constructed 
by |Gneiting et al.| ( f2QTo| ). 

The only Markovian Matern models are those where the smoothing parameter is d/2 less than an 
integer. It turns out, however, that we can construct good Markov random field approximations when 
a isn't an integer (see the authors' response in Lindgren et al. 2011). This is essentially a powerful 



extension of the GMRF approximations of Rue and Tjelmeland (2002). We are currently working on 
approximating non-Matern random fields using this method. 



4 Conclusion 



In this paper we have surveyed the deep and fascinating link between the continuous Markov property 
and computationally efficient inference. The material in this paper barely scratches the surface of the 



questions, applications and oddities of these fields (for instance, Bolin (2011 ) uses a similar construction 
for non-Gaussian noise processes). However, we have demonstrated that by carefully considering the 
Markov property, we are able to greatly boost our modelling capabilities while at the same time ensuring 
that the models are fast to compute with. The combination of flexible modelling and fast computations 
ensures that investigations into Markovian random fields will continue to produce interesting and useful 
results well into the future. 
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